[Linux] Shell Programming, Lesson 16

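N: adds the next line of the data stream to the pattern space, creating a multiline group to process

D: deletes the first line of the multiline group

P: prints the first line of the multiline group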

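The lowercase n command tells the sed editor to move to the next line of text in the data stream without going back to the beginning of the script. Normally, sed runs every defined command on the current line before it moves to the next line; the single-line next command changes that flow.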

/^$/ matches an empty line.
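
As a minimal sketch of the difference, the commands below run on a made-up five-line sample (not one of the data files in this lesson). /^$/d removes every empty line, while /header/{n;d;} removes only the line that follows the header. The variable names are illustrative.

```shell
# Hypothetical sample: an empty line after the header and another before the last line.
sample='This is the header line.

This is a data line.

This is the last line.'

# /^$/d deletes every empty line.
all_removed=$(printf '%s\n' "$sample" | sed '/^$/d')

# /header/{n;d;}: on the header line, n prints it and loads the next line,
# then d deletes that next line only.
one_removed=$(printf '%s\n' "$sample" | sed '/header/{n;d;}')

printf '%s\n' "$all_removed" '---' "$one_removed"
```

The second form leaves the empty line before the last line in place, because d only ever sees the line that n loaded.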

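The single-line next command moves the next line of text into the sed editor's working area (called the pattern space), replacing the current line. The multiline version (uppercase N) appends the next line to the text already in the pattern space. This combines two lines from the data stream in the same pattern space. The lines are still separated by a newline character, but sed now treats them as a single line: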

$ cat data2.txt
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.
$ 
$ sed '/first/{N; s/\n/ /}' data2.txt
This is the header line.
This is the first data line. This is the second data line.
This is the last line.
$

This is a practical technique when you need to search a data file for a phrase that may be split across two lines:

$ cat data3.txt
On Tuesday, the Linux System
Administrator's group meeting will be held.
All System Administrators should attend.
Thank you for your attendance.
$
$ sed 'N; s/System Administrator/Desktop User/' data3.txt
On Tuesday, the Linux System
Administrator's group meeting will be held.
All Desktop Users should attend.
Thank you for your attendance.
$

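The substitution command looks for the two-word phrase System Administrator. Although the script above uses the N command, it does not recognize the phrase that spans the first and second lines, because a newline character, not a space, separates the two words there. Use the following form instead: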

$ sed 'N ; s/System.Administrator/Desktop User/' data3.txt
On Tuesday, the Linux Desktop User's group meeting will be held.
All Desktop Users should attend.
Thank you for your attendance.
$

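Note that the substitution command above uses the wildcard pattern (.) between System and Administrator to match both the space and the newline. When it matches the newline, however, it removes the newline from the string, so the two lines are merged into one. To avoid this, use two substitution commands in the sed script: one to match the phrase when it spans two lines, and one to match it within a single line: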

$ sed 'N; s/System\nAdministrator/Desktop\nUser/; s/System Administrator/Desktop User/' data3.txt
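
For reference, here is a sketch of what the two-substitution script produces on the data3.txt text shown earlier. The text is held in a shell variable so the snippet is self-contained, and it assumes GNU sed, which accepts \n in the replacement part of s.

```shell
# Same four lines as data3.txt, kept in a variable (illustrative name).
data3="On Tuesday, the Linux System
Administrator's group meeting will be held.
All System Administrators should attend.
Thank you for your attendance."

# First s handles the phrase split across two lines and keeps the newline;
# second s handles the phrase on a single line.
result=$(printf '%s\n' "$data3" |
    sed 'N; s/System\nAdministrator/Desktop\nUser/; s/System Administrator/Desktop User/')

printf '%s\n' "$result"
```

The line break between "Desktop" and "User's" is preserved, unlike with the wildcard version.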

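This script still has a problem, though. It always reads the next line into the pattern space before running the other sed commands. When it reaches the last line of text there is no next line to read, so the N command makes sed stop. If the text to match is on that last line, the commands never see it. One fix is to put the single-line command before the N command and the multiline command after it: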

$ sed 's/System Administrator/Desktop User/; N; s/System\nAdministrator/Desktop\nUser/' data3.txt

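When the single-line delete command d is used with N and the match succeeds, it deletes both lines in the pattern space. The sed editor also provides the multiline delete command D, which deletes only the first line in the pattern space, that is, every character up to and including the first newline:
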
$ sed 'N; /System\nAdministrator/D' data4.txt
Administrator's group meeting will be held.
All System Administrators should attend.
$
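
This is useful when you need to delete the line that comes before the line containing the target string. The following example deletes the blank line that appears before the first line of the data stream:
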
$ cat data5.txt

This is the header line.
This is a data line.

This is the last line.
$
$ sed '/^$/{N; /header/D}' data5.txt
This is the header line.
This is a data line.

This is the last line.
$

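The sed editor looks for an empty line and then uses the N command to append the next line to the pattern space. If the new pattern space contains the word header, the D command deletes the first line in the pattern space.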

The multiline print command P prints only the first line of a multiline pattern space, that is, every character up to the first newline.
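
A small sketch of P in action, using the three-line text that the data4.txt examples in this lesson imply (the variable names are illustrative). With -n, only the first line of the matching two-line group is printed.

```shell
# Three lines matching what the data4.txt examples operate on.
data4="On Tuesday, the Linux System
Administrator's group meeting will be held.
All System Administrators should attend."

# $!N appends the next line (except on the last line);
# P prints only up to the first newline of the pattern space when the pattern matches.
first_line=$(printf '%s\n' "$data4" | sed -n '$!N; /System\nAdministrator/P')

printf '%s\n' "$first_line"
```

A lowercase p in the same place would print both lines of the group.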

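Note that the D command forces sed to return to the start of the script and run the commands again on what is left in the pattern space, without reading a new line from the data stream.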
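
One common way to combine N, P, and D is a sliding two-line window. The sketch below is a widely used sed idiom, not something taken from this lesson, and the sample input is made up. It drops consecutive duplicate lines.

```shell
# Hypothetical input with repeated neighbouring lines.
input='alpha
alpha
beta
beta
beta
gamma'

# $!N               append the next line, except on the last line
# /^\(.*\)\n\1$/!P  print the first line unless both lines in the window are identical
# D                 delete the first line and rerun the script on what is left
deduped=$(printf '%s\n' "$input" | sed '$!N; /^\(.*\)\n\1$/!P; D')

printf '%s\n' "$deduped"
```

Because D restarts the script without reading input, the second line of each pair becomes the first line of the next window.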

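The pattern space is an active buffer that holds the text being examined while sed runs its commands. The sed editor has a second buffer called the hold space. While you work on lines in the pattern space, you can use the hold space to store lines temporarily. Five commands operate on the hold space: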

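Command    Description
h          Copy the pattern space to the hold space
H          Append the pattern space to the hold space
g          Copy the hold space to the pattern space
G          Append the hold space to the pattern space
x          Exchange the contents of the pattern space and the hold space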

$ cat data2.txt
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.
$
$ sed -n '/first/ {h;p;n;p;g;p}' data2.txt
This is the first data line.
This is the second data line.
This is the first data line.
$
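
To make the role of each command visible, here is a commented variant of the same idea. It is a sketch with the data2.txt text held in a variable (illustrative names), and it drops the first p so that the two data lines come out in swapped order.

```shell
data2='This is the header line.
This is the first data line.
This is the second data line.
This is the last line.'

swapped=$(printf '%s\n' "$data2" | sed -n '/first/{
# save the matching line in the hold space
h
# load the next input line into the pattern space and print it
n
p
# copy the saved line back into the pattern space and print it
g
p
}')

printf '%s\n' "$swapped"
```

The hold space is what lets the "first" line survive while the pattern space is reused for the "second" line.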

The exclamation mark command (!) negates a command, so that the command does not act where it normally would:

$ sed -n '/header/!p' data2.txt
This is the first data line.
This is the second data line.
This is the last line.

Every line in the file is printed except the one that contains the word header.
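
The same negation works with line-number addresses. As a small sketch on the data2.txt text (held in a variable here, with illustrative names), 2,3!p prints every line outside the range 2 to 3:

```shell
data2='This is the header line.
This is the first data line.
This is the second data line.
This is the last line.'

# With -n, p runs only on lines that are NOT in the range 2,3.
outside=$(printf '%s\n' "$data2" | sed -n '2,3!p')

printf '%s\n' "$outside"
```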

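The exclamation mark also helps with the N problem described earlier. With a plain N, a match on the last line is missed because there is no next line to read:

$ sed 'N; s/System\nAdministrator/Desktop\nUser/; s/System Administrator/Desktop User/' data4.txt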
On Tuesday, the Linux Desktop
User's group meeting will be held.
All System Administrators should attend.

Putting $! in front of N fixes this:

$ sed '$!N; s/System\nAdministrator/Desktop\nUser/; s/System Administrator/Desktop User/' data4.txt
On Tuesday, the Linux Desktop
User's group meeting will be held.
All Desktop Users should attend.

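In the script above, the dollar sign stands for the last line of text in the data stream, so sed skips the N command when it reaches the last line but runs it on every other line. With the same technique you can reverse the order of the lines in a data stream. The steps are: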

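(1) Place a line in the pattern space.
(2) Copy the line from the pattern space to the hold space.
(3) Place the next line in the pattern space.
(4) Append the hold space to the pattern space.
(5) Copy everything in the pattern space to the hold space.
(6) Repeat steps (3) to (5) until all lines are in the hold space in reverse order.
(7) Retrieve and print the lines.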

$ cat data2.txt
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.
$
$ sed -n '{1!G; h; $p}' data2.txt
This is the last line.
This is the second data line.
This is the first data line.
This is the header line.
$
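
The one-liner is compact, so here is a commented sketch of the same script that shows which command carries out which part of the procedure. The data2.txt text is held in a variable, and the names are illustrative.

```shell
data2='This is the header line.
This is the first data line.
This is the second data line.
This is the last line.'

reversed=$(printf '%s\n' "$data2" | sed -n '
# on every line but the first, append the hold space (earlier lines, already reversed)
1!G
# copy the whole pattern space to the hold space
h
# on the last line, print the accumulated text
$p
')

printf '%s\n' "$reversed"
```

The first line skips G because the hold space starts out empty, and appending it would add a stray blank line at the end of the output.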

Of course, the Linux tac command can also reverse a text file. It displays a file in reverse line order, the opposite of what cat does.

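The sed editor provides a way to skip an entire block of commands based on an address, an address pattern, or an address range. This lets you run a group of commands only on specific lines of the data stream.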

The format of the branch command b is: [address]b [label]

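The address parameter determines which lines trigger the branch command, and the label parameter defines where to jump to. If the label is omitted, the branch jumps to the end of the script: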

$ cat data2.txt
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.
$
$ sed '{2,3b; s/This is/Is this/; s/line./test?/}' data2.txt
Is this the header test?
This is the first data line.
This is the second data line.
Is this the last test?
$

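The branch command skips the two substitution commands for lines 2 and 3 of the data stream. If you do not want to jump straight to the end of the script, you can define a label for the branch to jump to. A label starts with a colon and can be up to 7 characters long, for example :label2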

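To specify the label, add it after the b command. With a label you can skip the commands that follow the branch for lines that match its address, yet still run the other commands in the script: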

$ sed '{/first/b jump1; s/This is the/No jump on/; :jump1; s/This is the/Jump here on/}' data2.txt
No jump on header line.
Jump here on first data line.
No jump on second data line.
No jump on last line.
$

The next example jumps to a label that appears earlier in the sed script, which creates a loop:

$ echo "This, is, a, test, to, remove, commas." | sed -n '{:start; s/,//1p; b start;}'
This is, a, test, to, remove, commas.
This is a, test, to, remove, commas.
This is a test, to, remove, commas.
This is a test to, remove, commas.
This is a test to remove, commas.
This is a test to remove commas.
^C
$

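This script has a problem: it never stops looking for commas, so it is stuck in an endless loop. To fix it, give the branch command an address pattern to check for: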

$ echo "This, is, a, test, to, remove, commas." | sed -n '{
> :start
> s/,//1p
> /,/b start
> }'
This is, a, test, to, remove, commas.
This is a, test, to, remove, commas.
This is a test, to, remove, commas.
This is a test to, remove, commas.
This is a test to remove, commas.
This is a test to remove commas.
$

Now the branch command jumps only when the line still contains a comma. After the last comma is removed, the branch no longer runs.
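
The same pattern, a backward branch guarded by an address, works with line addresses too. The sketch below is a well-known idiom rather than part of this lesson: it loops with N until the last line and then joins all lines with spaces. The sample input is made up, and each command is passed with its own -e so that the label ends cleanly.

```shell
# :a      label marking the top of the loop
# N       append the next input line to the pattern space
# $!ba    unless this is the last line, branch back to a
# s/\n/ /g  once everything is collected, turn newlines into spaces
joined=$(printf 'one\ntwo\nthree\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')

printf '%s\n' "$joined"
```

Here $! plays the role that /,/ plays in the comma example: it is the condition that eventually stops the loop.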

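The test command t also changes the flow of a sed script. Instead of jumping based on an address, it jumps to a label based on the result of a substitution command. If a substitution has successfully matched and replaced a pattern, the test command jumps to the given label; if not, it does not jump. Its format is: [address]t [label]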

As with the branch command, if no label is given, sed jumps to the end of the script when the test succeeds.

For example, when one substitution has already been made and a second one is not needed, the test command helps:

$ sed 's/first/matched/; t; s/This is the/No match on/' data2.txt
No match on header line.
This is the matched data line.
No match on second data line.
No match on last line.
$

Consider the following situation:

$ echo "The cat sleeps in his hat." | sed 's/.at/".at"/g'
The ".at" sleeps in his ".at".
$

This is clearly not the result we want: the replacement string cannot reproduce the text that the wildcard matched.

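The sed editor provides a solution. The & symbol represents the text matched by the pattern in a substitution command. Whatever text the pattern matches, you can use & in the replacement to refer to it, which lets you work with any word the pattern matches:
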
$ echo "The cat sleeps in his hat." | sed 's/.at/"&"/g'
The "cat" sleeps in his "hat".
$

When the pattern matches the word cat, "cat" appears in the replacement text. When it matches the word hat, "hat" appears in the replacement text.
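
Two more small sketches of &, on made-up input: wrapping every number in brackets, and escaping & as \& when a literal ampersand is wanted in the replacement.

```shell
# & stands for whatever [0-9][0-9]* matched, so each number is wrapped as-is.
bracketed=$(echo "ports 8080 and 443" | sed 's/[0-9][0-9]*/[&]/g')

# \& inserts a literal ampersand instead of the matched text.
literal=$(echo "R and D" | sed 's/ and / \& /')

printf '%s\n' "$bracketed" "$literal"
```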

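The sed editor uses parentheses to define subpatterns within the search pattern. You can then refer to each subpattern in the replacement with a special sequence made of a backslash and a number that gives the subpattern's position: \1 for the first subpattern, \2 for the second, and so on.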

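Note that when you use parentheses in a substitution command, you must escape them so that they are treated as grouping characters rather than literal parentheses:

$ echo "The System Administrator manual" | sed 's/\(System\) Administrator/\1 User/'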
The System User manual
$
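
As a side note, the escaping rule applies to sed's default basic regular expressions. A sketch of the two spellings, assuming a sed that supports the -E option for extended regular expressions (GNU and BSD sed both do):

```shell
# Basic regular expressions (default): grouping parentheses are escaped.
bre=$(echo "The System Administrator manual" | sed 's/\(System\) Administrator/\1 User/')

# Extended regular expressions (-E): plain parentheses group.
ere=$(echo "The System Administrator manual" | sed -E 's/(System) Administrator/\1 User/')

printf '%s\n' "$bre" "$ere"
```

Both commands print the same line; only the way the group is written differs.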

Subpatterns are especially handy when you need to replace a phrase with a single word that is part of that phrase, and that word is matched by a wildcard:

$ echo "That furry cat is pretty" | sed 's/furry \(.at\)/\1/'
That cat is pretty
$
$ echo "That furry hat is pretty" | sed 's/furry \(.at\)/\1/'
That hat is pretty
$
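
In this case you cannot use the & symbol, because it stands for the entire matched pattern. A subpattern lets you choose just one part of the match to use in the replacement.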
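
Subpatterns can also be reordered. A minimal sketch on a made-up "Last, First" string, swapping the two captured words:

```shell
# \1 captures the word before the comma, \2 the word after it;
# the replacement emits them in the opposite order.
swapped=$(echo "Doe, John" | sed 's/\([A-Za-z]*\), \([A-Za-z]*\)/\2 \1/')

printf '%s\n' "$swapped"
```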

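This feature is especially useful when you need to insert text between two or more subpatterns. For example, the following script uses subpatterns to insert commas into a large number:

$ echo "1234567" | sed '{:start; s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/; t start}'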
1,234,567
$
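
The pattern looks for two subpatterns. The first is any run of characters that ends in a digit. The second is a group of three digits. If the pattern is found, the replacement puts a comma between the two subpatterns, each referenced by its position. The script uses the test command to loop over the number until all the commas are in place.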
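
To watch the loop work, the sketch below adds the p flag to the substitution and runs sed with -n, so each pass that inserts a comma prints its intermediate result. Each command gets its own -e so the label ends cleanly; the variable name is illustrative.

```shell
# Every successful substitution prints the pattern space (p flag),
# then t jumps back to :start for another pass.
steps=$(echo "1234567" |
    sed -n -e ':start' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/p' -e 't start')

printf '%s\n' "$steps"
```

The greedy .* makes each pass place the rightmost missing comma, so the commas are inserted from right to left. The loop ends on the first pass where no digit is directly followed by three more digits.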

In a shell script you can use ordinary shell variables and parameters together with sed scripts, for example:

$ cat reverse.sh
#!/bin/bash
sed -n '{ 1!G ; h ; $p }' "$1"
$
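
This script uses the shell parameter $1 to take the first argument from the command line, which is the name of the file to reverse. You can now apply the sed script to any file without retyping it on the command line each time.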
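
Shell values can also be placed inside the sed script itself when the script is written in double quotes. The helper below is hypothetical (replace_word is not from this lesson) and is only a sketch: it assumes neither argument contains /, &, or regular-expression metacharacters.

```shell
# Hypothetical helper: replace_word OLD NEW, filtering standard input.
# Double quotes let the shell expand $1 and $2 before sed sees the script.
replace_word() {
    sed "s/$1/$2/g"
}

changed=$(echo "This is the first data line." | replace_word first 1st)

printf '%s\n' "$changed"
```

Single quotes, as in reverse.sh, keep the shell from touching sed's own $ addresses, which is why the file name there is passed as a separate argument instead.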

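Inside a script you can use $() to capture the output of a sed command in a variable for later use. The following example uses a sed script to add commas to the result of a numeric calculation.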

$ cat fact.sh
#!/bin/bash
#Add commas to number in factorial answer
#
factorial=1
counter=1
number=$1
#
while [ "$counter" -le "$number" ]
do
    factorial=$(( factorial * counter ))
    counter=$(( counter + 1 ))
done
#
result=$(echo "$factorial" | sed '{
:start
s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
t start
}')
#
echo "The result is $result"
#
$
$ ./fact.sh 20
The result is 2,432,902,008,176,640,000
$

Some commands can also add line numbers, but they insert extra spacing that you may not want, for example:

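$ nl data2.txt
     1	This is the header line.
     2	This is the first data line.
     3	This is the second data line.
     4	This is the last line.
$
$ cat -n data2.txt
     1	This is the header line.
     2	This is the first data line.
     3	This is the second data line.
     4	This is the last line.
$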

Now look at the following command:

$ sed '=' data2.txt
1
This is the header line.
2
This is the first data line.
3
This is the second data line.
4
This is the last line.
$

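The = command adds line numbers, but each number appears on the line above its text. To put the number and the text on the same line, pipe the output through a second sed command.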
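
A minimal sketch of that pipeline, using the first two lines of data2.txt as inline input: the second sed uses N to pull each text line up next to its number and then replaces the newline with a space.

```shell
# First sed: print each line number on its own line before the text.
# Second sed: N joins number and text into one pattern space, s swaps the newline for a space.
numbered=$(printf 'This is the header line.\nThis is the first data line.\n' |
    sed '=' | sed 'N; s/\n/ /')

printf '%s\n' "$numbered"
```

Unlike nl or cat -n, this adds no padding in front of the numbers.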
